
Task01 Vadim Kozlov ITMO #1042

Open
six-nine wants to merge 4 commits into GPGPUCourse:task01 from six-nine:task01

@six-nine commented Feb 22, 2026

Local output

$ ./main_aplusb_matrix
  Device #0: API: OpenCL. GPU. Apple M2 Pro. Total memory: 21845 Mb.
Using device #0: API: OpenCL. GPU. Apple M2 Pro. Total memory: 21845 Mb.
Using OpenCL API...
matrices size: 16384x8192 = 3 * 512 MB
Running BAD matrix kernel...
Kernels compilation done in 0.034766 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.045557 10%=0.045572 median=0.045623 90%=0.114009 max=0.114009)
a + b kernel median VRAM bandwidth: 32.8782 GB/s
Running GOOD matrix kernel...
Kernels compilation done in 0.004586 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.008698 10%=0.008725 median=0.008789 90%=0.047197 max=0.047197)
a + b kernel median VRAM bandwidth: 170.668 GB/s
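The bandwidth figures in these logs appear consistent with dividing the total memory traffic (3 × 512 MB: two input matrices read and one output matrix written) by the median kernel time. Below is a minimal sketch of that arithmetic. The function name, the 4-byte element size, and the use of 1 GB = 1024³ bytes are assumptions inferred from the numbers in the log, not taken from the benchmark source.

```python
def median_bandwidth_gb_s(median_seconds, rows=16384, cols=8192,
                          elem_bytes=4, buffers=3):
    """Estimate bandwidth in GB/s (1 GB = 1024**3 bytes, an assumption).

    buffers=3 counts two matrices read and one matrix written.
    elem_bytes=4 is assumed because 16384 * 8192 * 4 bytes = 512 MB.
    """
    total_bytes = rows * cols * elem_bytes * buffers
    return total_bytes / (1024 ** 3) / median_seconds


# Median times copied from the local run above.
print(f"BAD:  {median_bandwidth_gb_s(0.045623):.2f} GB/s")
print(f"GOOD: {median_bandwidth_gb_s(0.008789):.2f} GB/s")
```

The results agree with the logged values only up to the rounding of the printed median times.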

GitHub CI output

$ ./main_aplusb_matrix
...

@GPUcourseBOT
Collaborator

Test results for PR #1042

Test logs
=== STATUS: Programs executed successfully: main_aplusb_matrix ===
=== main_aplusb_matrix stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.57138 sec (CUDA: 0.114671 sec, OpenCL: 0.707158 sec, Vulkan: 7.74949 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
matrices size: 16384x8192 = 3 * 512 MB
Running BAD matrix kernel...
Kernels compilation done in 3.48946 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.006532 10%=0.006533 median=0.006536 90%=3.49609 max=3.49609)
a + b kernel median VRAM bandwidth: 229.498 GB/s
Running GOOD matrix kernel...
Kernels compilation done in 0.072696 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.010631 10%=0.010633 median=0.010639 90%=0.083395 max=0.083395)
a + b kernel median VRAM bandwidth: 140.991 GB/s


@six-nine
Author


Something is off; still debugging...

@GPUcourseBOT
Collaborator

Test results for PR #1042

Test logs
=== STATUS: Programs executed successfully: main_aplusb_matrix ===
=== main_aplusb_matrix stdout (exit code: -11 (segfault after execution)) ===
Found 1 GPUs in 8.55892 sec (CUDA: 0.113286 sec, OpenCL: 0.717539 sec, Vulkan: 7.72802 sec)
Available devices:
Device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using device #0: API: CUDA+OpenCL+Vulkan. GPU. Tesla T4 (CUDA 12020). Free memory: 14822/14930 Mb.
Using OpenCL API...
matrices size: 16384x8192 = 3 * 512 MB
Running BAD matrix kernel...
Kernels compilation done in 3.49659 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.031919 10%=0.031956 median=0.032117 90%=3.55054 max=3.55054)
a + b kernel median VRAM bandwidth: 46.7042 GB/s
Running GOOD matrix kernel...
Kernels compilation done in 0.069782 seconds
a + b matrix kernel times (in seconds) - 10 values (min=0.006273 10%=0.006279 median=0.006281 90%=0.076121 max=0.076121)
a + b kernel median VRAM bandwidth: 238.815 GB/s
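A quick way to sanity-check runs like these is to compare the BAD and GOOD median times directly. The sketch below uses the medians reported in this thread. The helper name `speedup` and the run labels are hypothetical, introduced only for illustration.

```python
def speedup(bad_median_s, good_median_s):
    """How many times faster the GOOD kernel is than the BAD one."""
    return bad_median_s / good_median_s


# (BAD median, GOOD median) in seconds, copied from the logs in this thread.
runs = {
    "Apple M2 Pro (local)": (0.045623, 0.008789),
    "Tesla T4 (first CI run)": (0.006536, 0.010639),
    "Tesla T4 (latest CI run)": (0.032117, 0.006281),
}

for name, (bad, good) in runs.items():
    print(f"{name}: GOOD/BAD speedup = {speedup(bad, good):.2f}x")
```

A ratio below 1 means the BAD kernel ran faster than the GOOD one, as in the first CI run. In the local run and the latest CI run the GOOD kernel is roughly five times faster.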
